Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory containment for FIELD_LIST operands on x86 #65803

Merged
merged 5 commits into from
Mar 31, 2022

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Feb 23, 2022

Enables broader containment for FIELD_LIST operands on x86.

The primary benefits come from containing loads that long decomposition has left (e. g. LCL_FLDs):

-       mov      ecx, dword ptr [ebp+08H]
-       mov      edx, dword ptr [ebp+0CH]
-       push     edx
-       push     ecx
+       push     dword ptr [ebp+0CH]
+       push     dword ptr [ebp+08H]

Additionally, there is some code simplification enabled by the use of OperandDesc in two inst_* functions with this change.

Some some good x86(-only) diffs. A few small regressions because of some worse RA choices.

Note: I think a couple stress runs for this change are warranted, the logic here is not straightforward.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 23, 2022
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 23, 2022
@ghost
Copy link

ghost commented Feb 23, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Enables broader containment for FIELD_LIST operands on x86.

The primary benefits come from containing loads that long decomposition has left (e. g. LCL_FLDs):

-       mov      ecx, dword ptr [ebp+08H]
-       mov      edx, dword ptr [ebp+0CH]
-       push     edx
-       push     ecx
+       push     dword ptr [ebp+0CH]
+       push     dword ptr [ebp+08H]

Additionally, there is some code simplification enabled by the use of OperandDesc in two inst_* functions with this change.

We're expecting some some good x86 (only) diffs.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@SingleAccretion SingleAccretion force-pushed the PutArgFldList-Containment branch 2 times, most recently from 4e01bc3 to b62428c Compare February 23, 2022 21:22
@SingleAccretion
Copy link
Contributor Author

Windows ARM64 build failures are #65818.

@dotnet/jit-contrib - some x86 CQ improvements to unblock further improvements.

@SingleAccretion SingleAccretion marked this pull request as ready for review February 24, 2022 16:10
@SingleAccretion SingleAccretion force-pushed the PutArgFldList-Containment branch from b62428c to 9b93e40 Compare February 25, 2022 20:56
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Mar 4, 2022
@SingleAccretion SingleAccretion force-pushed the PutArgFldList-Containment branch 2 times, most recently from 614d1e7 to d383575 Compare March 7, 2022 21:59
@BruceForstall
Copy link
Member

Some nice diffs here. Looking through the diffs, it's easy to see where we could do better (not related to this change), e.g.:

       mov      dword ptr [ebp-24H], edx
       mov      dword ptr [ebp-20H], eax
       push     dword ptr [ebp-20H]                 // could just use "push eax"
       push     dword ptr [ebp-24H]                 // could just use "push edx"

@BruceForstall
Copy link
Member

BruceForstall commented Mar 22, 2022

I found an interesting diff in spmi libraries_tests, System.Tests.TimeZoneInfoTests:NJulianRuleTest(System.String,int,int,bool):
base:

G_M59070_IG13:        ; gcrefRegs=00000048 {ebx esi}, byrefRegs=00000000 {}, byref
       push     dword ptr [ebp-38H]
       push     bword ptr [ebp-64H]
       ; byr arg push 1
       push     0
       push     0
       push     0
       mov      ecx, gword ptr [ebp-5CH]
       ; gcrRegs +[ecx]
       ; GC ptr vars -{V41 V45 V61 V100}
       call     [System.IO.Strategies.BufferedFileStreamStrategy:WriteSpan()]
       ; gcrRegs -[ecx]
       ; gcr arg pop 1

diff:

G_M59070_IG13:        ; gcrefRegs=00000048 {ebx esi}, byrefRegs=00000000 {}, byref
       push     dword ptr [ebp-38H]
       push     bword ptr [ebp-64H]
       ; byr arg push 1
       push     0
       push     0
UNDONE: record GCref push [cns]
       push     0
       ; gcr arg push 4
       mov      ecx, gword ptr [ebp-5CH]
       ; gcrRegs +[ecx]
       ; GC ptr vars -{V41 V45 V61 V100}
       call     [System.IO.Strategies.BufferedFileStreamStrategy:WriteSpan()]
       ; gcrRegs -[ecx]
       ; gcr arg pop 2

Can you explain this?

Looks like the code is pre-existing but perhaps wasn't executed before:

            // Did we push a GC ref value?
            if (id->idGCref())
            {
#ifdef DEBUG
                printf("UNDONE: record GCref push [cns]\n");
#endif
            }

Minimally, this needs to be:

            // Did we push a GC ref value?
            if (id->idGCref())
            {
                JITDUMP("UNDONE: record GCref push [cns]\n");
            }

but it seems like this should be an assert.

Also, it's worrisome that the GC info is changed with your changes.

(also saw this in System.Tests.TimeZoneInfoTests:IsDaylightSavingTime_CatamarcaMultiYearDaylightSavings())

@BruceForstall
Copy link
Member

One of your changes does retyping, so we end up with diffs like:

;  V17 tmp11        [V17,T04] (  3, 32   )    bool  ->  [ebp-2CH]   spill-single-def V11.value(offs=0x01) P-INDEP "field V11.value (fldOffset=0x1)"
...
       mov      cl, byte  ptr [ebp-2CH]
       mov      byte  ptr [esp+01H], cl
=>
       mov      ecx, dword ptr [ebp-2CH]
       mov      byte  ptr [esp+01H], cl

Why do we want this? It looks like it doesn't hurt, but...?

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

src/coreclr/jit/emitxarch.cpp Show resolved Hide resolved
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Mar 22, 2022
@SingleAccretion
Copy link
Contributor Author

Why do we want this? It looks like it doesn't hurt, but...?

The premise is that we'd like to use push for varTypeIsSmall (non-candidate) LCL_VAR/LCL_FLD nodes. Retyping achieves that, and makes the logic for when that is legal centralized (LSRA and codegen can use a simple varTypeIsI test to determine whether push can be used).

I had two revisions of this change where that logic was flawed in one way or another, failing to deal properly with IND byte or FIELD<short>(LCL_FLD<byte>), and that's the code I came to after fixing the issues.

As you note, we may end up not using push after all, and then it's sort of pointless to retype (perhaps it could avoid partial register issues on some CPUs), but it should indeed not hurt things.

Can you explain this?

Yes, these were the diffs that worried me too... The cause is that the old inst_TT, when passed a GC-typed constant tree (here, null), called inst_IV, which in turn called GetEmitter()->emitIns_I(ins, EA_PTRSIZE, val);, losing GC-ness along the way. So with this change we're generating more (arguably redundant) GC info, but it should be strictly "safer". Perhaps we could retype nulls to nints in codegen for such situations, but that's a separate issue.

but it seems like this should be an assert.

What kind of assert do you have in mind?

@BruceForstall
Copy link
Member

The premise is that we'd like to use push for varTypeIsSmall (non-candidate) LCL_VAR/LCL_FLD nodes.

So are there cases where previously we would have something like:

       sub       esp, 4
       mov      cl, byte  ptr [ebp-2CH]
       mov      byte  ptr [esp], cl

and now we have something like:

       push     dword ptr [ebp-2CH]

? Assuming the target was a 4-byte slot?

What kind of assert do you have in mind?

If I add assert(!id->idGCref()); it never hits in the baseline. This indicates to me that this code never expected GC refs, and we need to do something, if this is really a GC ref.

But it also looks like there are two cases:

       push     0
UNDONE: record GCref push [cns]
       push     0
       ; gcr arg push 4

why does the second zero here get a GC update but the first goes through the code path that doesn't?

@SingleAccretion
Copy link
Contributor Author

So are there cases where previously we would have something like ... and now we have something like

I don't believe so, the optimization is for keeping parity with the previous implementation that did this effectively implicitly:

switch (fieldNode->OperGet())
{
    case GT_LCL_VAR:
        // Notably this doesn't handle the FIELD<short>(fieldNode<byte>) case correctly,
        // but that was ok because only very specific promotion-produced IR shapes were expected here.
        inst_TT(INS_push, fieldNode, 0, 0, emitActualTypeSize(fieldNode->TypeGet())); // Note the "emitActualTypeSize".
        break;

There are basically three nodes we can see on this path due to the aforementioned decomposition pessimization (that I also would like to remove at some point): LCL_VAR<field type> | CNS_INT<INT> | <anything INT-sized>, the scenario that was regressing without the optimization is a promoted small field of an address-exposed struct.

why does the second zero here get a GC update but the first goes through the code path that doesn't?

If we are looking at the same SPMI diff for NJulianRuleTest, then the diff is that these are for the same argument, I believe:

 IN006f: 00018C push     0
 IN0070: 00018E push     0
+UNDONE: record GCref push [cns]
 IN0071: 000190 push     0
+       ; gcr arg push 4
 IN0072: 000192 mov      ecx, gword ptr [ebp-5CH]
        ; gcrRegs +[ecx]
        ; GC ptr vars -{V41 V45 V61 V100}
 IN0073: 000195 call     System.IO.Strategies.BufferedFileStreamStrategy:WriteSpan()
        ; gcrRegs -[ecx]
-       ; gcr arg pop 1
+       ; gcr arg pop 2

 02    F0 BF ...| 017C        reg EAX becoming dead
 0E    BF 8A ...| 0182        reg ECX becoming dead
 F0 BF 8A    ...| 018C        push ptr  1 (iptr)
-F0 49    C8 ...| 0195        reg ECX becoming live
+A6    0D D0 ...| 0192        push ptr  4
+4B    D0 4B ...| 0195        reg ECX becoming live
 0D    4B F2 ...| 019A        reg ECX becoming dead
-C8    F2 45 ...| 019A        pop  1 ptrs
+D0    F2 45 ...| 019A        pop  2 ptrs
 4B    45 F1 ...| 019D        reg ECX becoming live
 F2 45    05 ...| 01BA        reg EAX becoming live
 F1 05    40 ...| 01CF        reg EAX becoming dead

This indicates to me that this code never expected GC refs, and we need to do something, if this is really a GC ref.

My understanding of the GC reporting for all push/pops is that it already happens via the emitOutputInstr -> emitStackPush -> emitStackPushLargeStk -> gcRegPtrAllocDsc call chain.

@BruceForstall
Copy link
Member

Maybe your new inst_TT should strip the GC info the same way the old code was (implicitly) doing it, and assert that a GC constant is null when doing so.

e.g.,

        case OperandKind::Imm:
        {
            // The only gc/byref allowed as an immediate is zero (null). Strip the GC-ness when generating the code.
            emitAttr attr = op1Desc.GetEmitAttrForImmediate(size);
            assert(!EA_IS_GCREF_OR_BYREF(attr) || (!EA_IS_RELOC(attr) && (op1Desc.GetImmediate() == 0)));
            emit->emitIns_I(ins, EA_SIZE(attr), op1Desc.GetImmediate());
            break;
        }

or, if you think it's ok to generate this new GC info, then replace:

            // Did we push a GC ref value?
            if (id->idGCref())
            {
#ifdef DEBUG
                printf("UNDONE: record GCref push [cns]\n");
#endif
            }

which seems unnecessary, with an assert:

            // We only expect zero gc/byref (non-relocatable) constant.
            assert(!id->idGCref() || (!id->idIsCnsReloc() && (val == 0)));

Comments?

@SingleAccretion
Copy link
Contributor Author

Comments?

I would like to understand - what's the reason for expecting only nulls here? I think we can get (and should handle) arbitrary constants, byrefs especially. E. g. here is a construction that would hit the val != 0 case (there could be others):

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Problem(ref int addrSrc)
{
    var addr = MemoryMarshal.CreateSpan(ref addrSrc, 1);
    ref int constAddr = ref *(int*)100;  // Just a placeholder, it could be address of a real (pinned) GC object

    if (Unsafe.AreSame(ref MemoryMarshal.GetReference(addr), ref constAddr))
    {
        CallForSpan(addr);
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void CallForSpan<T>(Span<T> value) { }

@BruceForstall
Copy link
Member

BruceForstall commented Mar 23, 2022

I would like to understand - what's the reason for expecting only nulls here?

I was just being conservative -- I didn't expect anything else. How could you get a non-null constant byref? I would think your example would be illegal. I would think it would have to be pinned memory if it was in the GC heap, at which point it could just be an unmanaged pointer, but it couldn't be constant anyway.

@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Mar 23, 2022

How could you get a non-null constant byref?

Well, we are sort of defining that for ourselves at this point, but in general the compiler does expect to see non-null byrefs (VN handles them at the very least), and we had bugs caused by "only null byrefs exist"-like assumptions.

It is true that we don't have that many sources of them right now, but it also the case that with the Unsafe.NullRef and derivatives I don't think we can really say that it's illegal at this point.

I would think it would have to be pinned memory if it was in the GC heap, at which point it could just be an unmanaged pointer, but it couldn't be constant anyway.

Ahh, but it could. Say it was a pointer into an array allocated with GC.AllocateArray(length, pinned: true), that was then saved into a static readonly, which would in turn be imported as a constant. Or that the creator of the code was actually using Reflection.Emit and "burned-in" the address of an object.

We also have the curious case of collectible assemblies which can be kept alive by RVAs-turned-constant-byrefs pointing into them.

@BruceForstall
Copy link
Member

Ok, you've convinced me there are weird cases, so we don't need to assert that the constant is null.

But it seems it's still safe to remove the GC reporting on it, since obviously the constant can't be updated by the GC.

@SingleAccretion
Copy link
Contributor Author

But it seems it's still safe to remove the GC reporting on it, since obviously the constant can't be updated by the GC.

It cannot be updated, but it still influences what things are reported as roots.

Modified example from earlier
private static void Main()
{
    Problem(ref *_objAddr);
}

[ModuleInitializer]
public static void Init() => RuntimeHelpers.RunClassConstructor(typeof(Program).TypeHandle);

[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
private static int Problem(ref int addrSrc)
{
    var addr = MemoryMarshal.CreateSpan(ref addrSrc, 1);
    ref int constAddr = ref *_objAddr;

    if (Unsafe.AreSame(ref MemoryMarshal.GetReference(addr), ref constAddr))
    {
        CallForProblemWithPushes(addr, RvrsRt());
    }

    int i = 0;
    for (; i < 10; i++) { }

    return i;
}

private static readonly int* _objAddr = ConstructObjAddrForPushes();
private static object _objRt;
private static WeakReference _objFlg;

private static int* ConstructObjAddrForPushes()
{
    var obj = GC.AllocateArray<int>(1, pinned: true);
    fixed (int* addr = obj)
    {
        _objRt = obj;
        _objFlg = new WeakReference(obj);
        return addr;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static long RvrsRt()
{
    Console.WriteLine(_objFlg.IsAlive);
    _objRt = null;
    return 0;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static ref int CallForProblemWithPushes(Span<int> objAddr, long use)
{
    Console.WriteLine(_objFlg.IsAlive);
    return ref objAddr[0];
}

Running under GCStress=0xC, TieredCompilation=0, with this change we will see:

UNDONE: record GCref push [cns]
True
True

If we strip the GC-ness in emitOutputIV:

             // Did we push a GC ref value?
             if (id->idGCref())
             {
 #ifdef DEBUG
+                id->idGCref(GCT_NONE);
                 printf("UNDONE: record GCref push [cns]\n");
 #endif
            }

We instead get a VM assert about a corrupt ref:

UNDONE: record GCref push [cns]
True

Assert failure(PID 6552 [0x00001998], Thread: 25516 [0x63ac]): SanityCheck()

CORECLR! MethodTable::Validate + 0x31 (0x5d836300)
CORECLR! Object::ValidateInner + 0x134 (0x5d840ea4)
CORECLR! Object::Validate + 0xAF (0x5d840d1f)
CORECLR! WKS::GCHeap::Promote + 0x77 (0x5dbc3fd7)
CORECLR! ByRefPointerOffsetsReporter::Report + 0x1C (0x5d876b60)
CORECLR! ByRefPointerOffsetsReporter::Find + 0x65 (0x5d870715)
CORECLR! ByRefPointerOffsetsReporter::Find + 0xDC (0x5d87078c)
CORECLR! ReportPointersFromValueType + 0x3B (0x5d876ba0)
CORECLR! ReportPointersFromValueTypeArg + 0x36 (0x5d876c5b)
CORECLR! MetaSig::GcScanRoots + 0x263 (0x5d870a75)
    File: C:\Users\Accretion\source\dotnet\runtime\src\coreclr\vm\methodtable.cpp Line: 8046
    Image: C:\Users\Accretion\source\dotnet\build\CustomCoreRoot_x86\corerun_x86.exe

(Presumably, when it walks the stack inside CallForProblemWithPushes)

This is "sort of" expected, you can get similar "GC holes" in other ways, because the compiler doesn't model shortening of lifetimes as a side-effect (indeed: constant propagation of byrefs that lead to this in the first place would be illegal if we modelled shortening of lifetimes as a side-effect).

That said, it does not feel right to me to "take advantage" of this being (implicitly) allowed in codegen/emitter specifically that should ideally work in such a way that "lifetimes in" are the same as "lifetimes out".

a) It is clearer and cheaper to directly use the emitter anyway.
b) "inst_RV_TT" will be made XARCH-only in an upcoming change.
a) Make them XARCH-only.
b) Rewrite them using "OperandDesc".
Enable broader containment for FIELD_LIST operands on x86.

The primary benefits come from containing loads that long
decomposition has left (e. g. LCL_FLDs):

```diff
-       mov      ecx, dword ptr [ebp+08H]
-       mov      edx, dword ptr [ebp+0CH]
-       push     edx
-       push     ecx
+       push     dword ptr [ebp+0CH]
+       push     dword ptr [ebp+08H]
```
Insert an assert instead about what we expect and handle GCInfo-wise.
@SingleAccretion SingleAccretion force-pushed the PutArgFldList-Containment branch from 19ab275 to d3d5326 Compare March 30, 2022 20:41
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the explanations and updates.

@BruceForstall BruceForstall merged commit e402d8e into dotnet:main Mar 31, 2022
@SingleAccretion SingleAccretion deleted the PutArgFldList-Containment branch March 31, 2022 21:11
@kunalspathak
Copy link
Member

Another windows/x86 regression in #68433

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants